Accelerating the Design of High-Energy-Density Hydrocarbon Fuels by Learning from the Data

By applying maximum substructure searches to molecules from the ZINC20 database with high strain energies and heats of combustion, common substructures were obtained that provide domain knowledge on how to design high-energy-density hydrocarbon (HEDH) fuels. Notably, quadricyclane and syntin could be topologically assembled from these substructures, and the corresponding assembly schemes guided the design of 20 fuel molecules (ZD-1 to ZD-20). The fuel properties of the molecules were evaluated using group-contribution methods and density functional theory (DFT) calculations, and ZD-6 stood out owing to its high volumetric net heat of combustion, high specific impulse, low melting point, and acceptable flash point. According to a neural network model for evaluating synthetic complexity (SCScore), the estimated score of ZD-6 was close to that of syntin, indicating that the synthetic complexity of ZD-6 is comparable to that of syntin. This work not only proposes ZD-6 as a potential HEDH fuel but also illustrates the advantage of learning design strategies from data for understanding structure-performance relationships and accelerating the development of novel HEDH fuels.


Introduction
As key components of engines, high-energy-density hydrocarbon (HEDH) fuels have long been the subject of scientific attention [1,2], and many qualified fuels have emerged, such as exo-tetrahydrodicyclopentadiene (JP-10) and quadricyclane (QC) [3][4][5]. In pursuit of higher payloads, HEDH fuels must meet demanding property requirements, including high density, low melting point, high flash point, and high specific impulse [6]. To this end, successive researchers have devoted themselves to designing and synthesizing new HEDH fuels [7][8][9]; cyclic hydrocarbons with high strain energy usually possess high density and a high gravimetric net heat of combustion [10,11]. Compared with traditional trial-and-error approaches, computational modeling and design using high-throughput computations, machine learning, and quantum chemistry calculations has become an important way to accelerate the discovery of new materials as computing power has grown rapidly [12][13][14][15][16][17][18][19][20][21]. For example, Xiao's group investigated pentacyclo[5.4.0.0²,⁶.0³,¹⁰.0⁵,⁹]undecane (PCU) and its hydrocarbon derivatives and, through DFT studies, showed their promise as HEDH fuels [22]. Namboothiri's group designed and synthesized 1,4-disubstituted cubane derivatives [23]. Parakhin constructed a series of N,N′-methylene-bridged polynitro hexaazaisowurtzitane molecules and found their specific impulses to be higher than that of CL-20 [24]. Recently, Li and co-workers proposed an effective machine learning (ML) method for the high-throughput screening of next-generation fuels from the hydrocarbon subset of GDB-13 and subsequently extended the idea with more advanced methods [25][26][27]. These approaches paved the way for designing novel fuels and yielded several high-performance candidates. However, given the "black-box" nature of ML models, which collapse multiple matrices into a single output, some inherent information about current fuels may be lost. Therefore, understanding the characteristics of existing fuels and using them to guide the rational design of new HEDH fuels is particularly important and attractive.
In recent years, the amount of data in various industries has grown exponentially, and many researchers have collected these data into specialized databases, for example, ZINC20 [28] and the Materials Project [29,30]. On the basis of these data, many groups have applied data mining approaches to extract relationships between performance and specific structural fingerprints or elemental compositions, and have learned design rules for polymer dielectrics [31], transparent conducting materials [32], and other materials [33]. These works inspired us and suggested an alternative way of designing novel HEDH fuels: learning from the data.
Herein, as depicted in Figure 1, high-throughput screening is employed to obtain molecules with potential as HEDH fuels from the ZINC20 database, and their structural features are mined by statistical analysis and a maximum substructure search to improve the domain knowledge. More importantly, schemes for topologically assembling syntin and QC from the common substructures of the screened candidates are identified and combined with a combinatorial design strategy to guide the design of 20 fuels. Finally, these designed fuels are evaluated with group-contribution methods and DFT calculations; all exhibit higher volumetric net heats of combustion than QC, illustrating their potential as HEDH fuels and the effectiveness of the proposed approach.

Results and Discussion
Since strained cyclic hydrocarbons release more energy than chain-structured fuels [34][35][36], the current work focuses on the features of cyclic hydrocarbons. Figure 2a shows the distribution of molecule types with different numbers of cyclic structures: only 12.3% of the molecules lack cyclic structures, while most of the molecules (over 60%) have one to three cyclic structures. Meanwhile, the proportion of each molecule type decreases sharply as the number of cyclic structures in the molecule increases. This may be related to the greater synthetic difficulty of incorporating more cyclic structures into a molecule, which manifests as a much smaller number of molecules with many cyclic structures. Having analyzed the distribution of molecule types with different numbers of cyclic structures, we turned to the distribution of different types of cyclic structures within each molecule type. As Figure 2b shows, three-, four-, five-, and six-membered cyclic structures account for the overwhelming majority of all cyclic structures, which can be ascribed to their relatively easier synthesis. Moreover, six-membered cyclic structures are dominant when the number of cyclic structures is less than eight, while five-membered structures hold the major proportion when the number of cyclic structures is higher than eight.
Considering the above structural analysis and the role of cyclic structures in fuel molecules [37], the strain energy, heat of combustion, and SCScore were quantified for the cyclic molecules, and their distributions among molecules containing different numbers of cyclic structures are illustrated in Figure 3. As shown in Figure 3a, the strain energy trends upward as the number of ring structures in the molecule increases to six. Given the inherently high strain energies of three- and four-membered cyclic structures (29.0 and 26.3 kcal/mol, respectively) [38,39], this trend can be explained by the observation in Figure 2b: the proportion of strained three- and four-membered cyclic structures in the corresponding molecule types increases until the molecule contains six cyclic structures. Indeed, the molecules with relatively high strain energies contain strained three- or four-membered cyclic structures, as shown in Figure 3a, in good agreement with the above analysis. Figure 3b illustrates the distribution of the heat of combustion among molecules containing different numbers of cyclic structures. The heat of combustion decreased as the number of cyclic structures in the molecules increased, because the gain in the enthalpy of formation from additional cyclic structures was smaller than the increase in the molar mass in the denominator (see Equation (S14) in the Supplementary Materials). Bicyclo[1.1.0]butane possessed the highest heat of combustion, showing the advantage of two fused three-membered rings in raising the heat of combustion and providing an effective route to high-heat-of-combustion fuels [37]. The SCScore distribution among molecules containing different numbers of cyclic structures is illustrated in Figure 3c. Notably, the synthetic complexity of a molecule was related to its number of cyclic structures, roughly following the rule that the fewer cyclic structures a molecule contains, the easier it is to synthesize. Additionally, the two molecules shown at the top of Figure 3c illustrate that long-chain and stereoscopic structures lead to relatively higher synthetic complexity. To extract further guidance for HEDH fuel design, 13 molecules with a strain energy over 300 kJ/mol and a heat of combustion over 45 MJ/kg were selected, as shown in Figure 3d, and the corresponding values are listed in Table S1. Almost all of these molecules possess fused cyclic or caged structures, which have been proposed as an effective way to improve the energy density of fuels [37].
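The molar-mass argument above can be made concrete with the textbook definition of the gravimetric net heat of combustion. The following is a minimal sketch under stated assumptions, not the paper's Equation (S14), which appears only in the Supplementary Materials: it assumes complete combustion of a CxHy hydrocarbon to CO2(g) and H2O(g), uses standard gas-phase enthalpies of formation of −393.5 kJ/mol (CO2) and −241.8 kJ/mol (H2O), and applies the selection thresholds stated above (strain energy > 300 kJ/mol, heat of combustion > 45 MJ/kg). The candidate names, enthalpies of formation, and strain energies are hypothetical.

```python
"""Sketch: gravimetric net heat of combustion and the SE/HoC selection filter.

Assumption: complete combustion CxHy + (x + y/4) O2 -> x CO2(g) + (y/2) H2O(g).
This is a textbook definition, not the paper's Equation (S14).
"""
from dataclasses import dataclass

M_C, M_H = 12.011, 1.008              # atomic masses, g/mol
DHF_CO2, DHF_H2O_G = -393.5, -241.8   # standard enthalpies of formation, kJ/mol


def net_heat_of_combustion(n_c: int, n_h: int, dhf_fuel: float) -> float:
    """Return the gravimetric net heat of combustion in MJ/kg (numerically = kJ/g).

    dhf_fuel is the enthalpy of formation of the fuel in kJ/mol.
    """
    dh_comb = n_c * DHF_CO2 + (n_h / 2) * DHF_H2O_G - dhf_fuel  # kJ/mol, negative
    molar_mass = n_c * M_C + n_h * M_H                           # g/mol
    return -dh_comb / molar_mass


@dataclass
class Candidate:
    name: str
    n_c: int
    n_h: int
    dhf: float            # enthalpy of formation, kJ/mol (hypothetical)
    strain_energy: float  # kJ/mol (hypothetical)


def select(cands, min_se=300.0, min_hoc=45.0):
    """Keep candidates with strain energy > min_se and NHOC > min_hoc."""
    return [c.name for c in cands
            if c.strain_energy > min_se
            and net_heat_of_combustion(c.n_c, c.n_h, c.dhf) > min_hoc]


if __name__ == "__main__":
    pool = [  # hypothetical records for illustration only
        Candidate("X1", 8, 8, 600.0, 650.0),
        Candidate("X2", 10, 16, -50.0, 120.0),
    ]
    for c in pool:
        print(c.name, round(net_heat_of_combustion(c.n_c, c.n_h, c.dhf), 2))
    print(select(pool))
```

In this form the fuel's enthalpy of formation enters the numerator while the molar mass sits in the denominator, which is the trade-off the text describes.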
Among the selected molecules, the maximum common substructure search approach was applied to obtain structural features, and the five most frequent maximum common substructures of these molecules (labeled ZM-1 to ZM-5) are depicted in Figure 4a. Almost all of them contain strained three- or four-membered cyclic structures as well as fused structures, indicating their advantages as building blocks for HEDH fuels. Figure 4b shows the topological assembly schemes for QC and syntin. QC could be topologically assembled from one ZM-5 and two ZM-2 structures through bond-sharing and bridging steps: the intermediate structure was constructed by fusing two ZM-2 structures with ZM-5 (sharing the bonds colored in purple), and bridging the orange sites then formed the caged QC topology. Syntin could be generated from three ZM-2 structures; specifically, two ZM-2 structures in turn bridged the orange sites of the initial ZM-2 structure. These two assembly schemes, learned from the topological structures, provided the subsequent design rules. Notably, the QC-assembly scheme forms caged structures, which probably increases the synthetic complexity. Since ZM-4 possessed a relatively complex stereoscopic structure, the syntin-assembly scheme was employed for ZM-4 to limit the synthetic complexity, while the QC-assembly scheme was applied to the other two maximum substructures (ZM-1 and ZM-3). Because QC itself was constructed from ZM-5 and two ZM-2 structures, and applying the syntin-assembly scheme to ZM-5 would have increased the difficulty of synthesis, ZM-5 was not considered in the subsequent design. After duplicate structures were removed, the 20 designed fuels (labeled ZD-1 to ZD-20) obtained from the proposed assembly schemes are shown in Figure 4c.
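The bond-sharing (fusion) and bridging operations in the assembly schemes can be illustrated on plain carbon-skeleton graphs. The sketch below is a hypothetical illustration, not the authors' procedure: the ZM substructures are shown only in Figure 4a, so cyclopropane stands in as the building block, hydrogens are left implicit, and the helper names (`combine`, `fuse`, `bridge`) are our own. Fusion identifies a bond of one fragment with a bond of another; bridging adds a new C-C bond between two sites. The ring count is the graph's circuit rank: bonds minus atoms plus connected components.

```python
"""Sketch: assembling carbon skeletons by bond sharing (fusion) and bridging.

Hydrogens are implicit. Cyclopropane is a hypothetical stand-in building
block (the ZM substructures themselves appear only in Figure 4a).
Ring count = circuit rank E - V + C (bonds - atoms + connected components).
"""


class Skeleton:
    def __init__(self, atoms, bonds):
        self.atoms = set(atoms)
        self.bonds = {frozenset(b) for b in bonds}  # duplicate bonds collapse

    def components(self) -> int:
        parent = {a: a for a in self.atoms}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for bond in self.bonds:
            a, b = tuple(bond)
            parent[find(a)] = find(b)
        return len({find(a) for a in self.atoms})

    def ring_count(self) -> int:
        return len(self.bonds) - len(self.atoms) + self.components()


def cyclopropane() -> Skeleton:
    return Skeleton(range(3), [(0, 1), (1, 2), (2, 0)])


def combine(g1: Skeleton, g2: Skeleton):
    """Disjoint union; g2's atom ids are shifted by the returned offset."""
    off = max(g1.atoms) + 1
    atoms = g1.atoms | {a + off for a in g2.atoms}
    bonds = [tuple(b) for b in g1.bonds] + [tuple(x + off for x in b) for b in g2.bonds]
    return Skeleton(atoms, bonds), off


def fuse(g1: Skeleton, bond1, g2: Skeleton, bond2) -> Skeleton:
    """Share a bond: identify bond2 of g2 with bond1 of g1 (ring fusion)."""
    g, off = combine(g1, g2)
    rename = {bond2[0] + off: bond1[0], bond2[1] + off: bond1[1]}
    atoms = {rename.get(a, a) for a in g.atoms}
    bonds = [tuple(rename.get(x, x) for x in b) for b in g.bonds]
    return Skeleton(atoms, bonds)


def bridge(g: Skeleton, a: int, b: int) -> Skeleton:
    """Add a new C-C bond between two sites (intra- or inter-fragment)."""
    return Skeleton(g.atoms, [tuple(x) for x in g.bonds] + [(a, b)])


if __name__ == "__main__":
    # Fusing two three-membered rings on a shared bond: bicyclo[1.1.0]butane.
    bcb = fuse(cyclopropane(), (0, 1), cyclopropane(), (0, 1))
    print(len(bcb.atoms), len(bcb.bonds), bcb.ring_count())
    # Bridging the two remaining ring atoms closes a cage (tetrahedrane skeleton).
    cage = bridge(bcb, 2, 5)
    print(len(cage.atoms), len(cage.bonds), cage.ring_count())
    # Linking three rings by single bonds: a 1,2-dicyclopropylcyclopropane skeleton.
    g, off = combine(cyclopropane(), cyclopropane())
    g = bridge(g, 0, off)
    g, off2 = combine(g, cyclopropane())
    g = bridge(g, 1, off2)
    print(len(g.atoms), len(g.bonds), g.ring_count())
```

Fusing two cyclopropanes gives the bicyclo[1.1.0]butane skeleton (4 atoms, 5 bonds, 2 rings), and an intramolecular bridge between its two remaining ring atoms yields a caged skeleton with 3 independent rings; linking three rings through intermolecular bridges keeps them separate (9 atoms, 11 bonds, 3 rings).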
Furthermore, Figure 5 presents the strain energy, volumetric net heat of combustion (V-NHOC), and SCScore of the designed fuels, calculated with the group-contribution methods and DFT. The great majority of the designed fuels possessed higher strain energies than QC and syntin, which could be ascribed to the additional strained three-membered cyclic and fused stereoscopic structures in the molecules. It is worth mentioning that ZD-1 was constructed from ZM-1 and two ZM-2 structures through the QC-assembly scheme; in other words, ZD-1 has a cage-like structure formed by four fused three-membered cyclic structures, giving it a strain energy exceeding 700 kJ/mol. Moreover, Table 1 lists the properties estimated from the DFT calculations and Glushko's formula, including the density, net heat of combustion (NHOC), melting point (Tm), flash point (FP), and specific impulse (Isp). The designed fuels ZD-1 to ZD-10, generated with the QC-assembly scheme, had higher densities than the fuels generated with the syntin-assembly scheme, which could be attributed to the additional fused structures introduced during structure generation [37]. Meanwhile, since V-NHOC equals the density multiplied by the NHOC, the higher density and NHOC yield a favorable V-NHOC; as shown in Figure 5, the V-NHOCs of the designed fuels were higher than that of QC, showing the effectiveness of the design strategy for HEDH fuels. For application as liquid fuels, the designed fuels with Tm higher than 273.15 K (including ZD-8 and ZD-11 to ZD-17) were not suitable [25]. Moreover, the FP, the lowest temperature at which fuel vapor ignites on contact with a flame, should preferably be higher than 38 °C [40], and several designed fuels met this requirement. Further taking the specific impulses and synthetic complexities into account, ZD-6 stood out with an Isp of 358.5 s and an acceptable SCScore of 2.64, showing its potential as an HEDH fuel.
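The V-NHOC relation and the liquid-fuel criteria used above can be sketched as a simple screen. This is a hypothetical illustration, not the authors' code: it assumes densities in g/cm³ (equivalent to kg/L) and NHOC in MJ/kg, so that V-NHOC comes out in MJ/L; it treats the 38 °C flash-point preference as a hard cutoff; and it takes the QC reference V-NHOC as an input rather than quoting a value. It does not estimate Isp (Glushko's formula is not reproduced here) or SCScore, both of which also entered the choice of ZD-6. The fuel records in the usage example are made up.

```python
"""Sketch: V-NHOC and the liquid-fuel screening rules (hypothetical data).

Assumptions: density in g/cm^3 (= kg/L) and NHOC in MJ/kg, so
V-NHOC = density * NHOC is in MJ/L. The flash-point preference is
treated as a hard cutoff for simplicity.
"""
from dataclasses import dataclass

TM_MAX_K = 273.15  # melting point must be below this for a liquid fuel
FP_MIN_C = 38.0    # flash point preferably above this, deg C


@dataclass
class Fuel:
    name: str
    density: float  # g/cm^3
    nhoc: float     # MJ/kg
    tm_k: float     # melting point, K
    fp_c: float     # flash point, deg C

    @property
    def v_nhoc(self) -> float:
        """Volumetric net heat of combustion in MJ/L."""
        return self.density * self.nhoc


def screen(fuels, reference_v_nhoc):
    """Names passing the Tm, FP, and V-NHOC-above-reference checks, highest V-NHOC first."""
    ok = [f for f in fuels
          if f.tm_k < TM_MAX_K and f.fp_c > FP_MIN_C and f.v_nhoc > reference_v_nhoc]
    return [f.name for f in sorted(ok, key=lambda f: f.v_nhoc, reverse=True)]


if __name__ == "__main__":
    fuels = [  # hypothetical records for illustration only
        Fuel("A", 1.00, 43.0, 200.0, 50.0),
        Fuel("B", 1.10, 42.0, 280.0, 60.0),  # solid at 273.15 K
        Fuel("C", 0.95, 44.0, 210.0, 45.0),
        Fuel("D", 1.05, 43.0, 220.0, 30.0),  # flash point too low
    ]
    print(screen(fuels, reference_v_nhoc=40.0))
```

Sorting by V-NHOC mirrors the emphasis on volumetric energy in the text; a fuller screen would add Isp and SCScore as further criteria.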

Figure 1. Illustration of the whole workflow.

Figure 2. (a) Distribution of the types of molecules with different numbers of cyclic structures after screening; (b) distribution of the types of cyclic structures in different types of molecules after screening.

Figure 3. (a) Strain energy distribution, (b) heat of combustion distribution, and (c) SCScore distribution among molecules containing different numbers of cyclic structures; (d) structures of the selected molecules. The colors are consistent with the molecular classes in Figure 2a.

Figure 4. (a) Maximum common substructures of the selected molecules; (b) diagram of the two assembly schemes for QC and syntin using the maximum substructures, where purple bonds denote shared bonds, orange circles denote the bridging positions for generating structures, and 1-3 denote the assembly steps; (c) designed fuel molecules.

Figure 5. Strain energy, volumetric net heat of combustion, and SCScore of the designed fuels.

Table 1. Estimated properties of the designed fuels, syntin, and QC.
